在用flask-sqlalchemy对一个千万级别表进行count操作时,出现了耗时严重、内存飙升的情况。
要统计出一天内车辆访问次数,原代码如下:

car_visit_counts = CarVisit.query.filter(
    CarVisit.park == car_visit.park,
    CarVisit.plate_number == car_visit.plate_number,
    CarVisit.visited_at >= today_start_time(),
).count()

发现代码运行特别慢,所以把生成的sql打印出来看一下:

SELECT
          COUNT(*) AS count_1
FROM
          (
                    SELECT
                              car_visits.id                            AS car_visits_id
                            , car_visits.park_id                       AS car_visits_park_id
                            , car_visits.store_id                      AS car_visits_store_id
                            , car_visits.car_id                        AS car_visits_car_id
                            , car_visits.brand_id                      AS car_visits_brand_id
                            , 
                            ...
                    FROM
                              car_visits
                    WHERE
                              %(param_1)s                 = car_visits.park_id
                              AND car_visits.plate_number = %(plate_number_1)s
                              AND car_visits.visited_at  >= %(visited_at_1)s
          )
          AS anon_1

可以发现进行了一次子查询,这样的话会生成临时表,效率低下,将原语句改变一下:

car_visit_counts = db.session.query(func.count(CarVisit.id)).filter(
    CarVisit.park == car_visit.park,
    CarVisit.plate_number == car_visit.plate_number,
    CarVisit.visited_at >= today_start_time(),
).scalar()

此时在看一下打印的sql语句:

SELECT
          COUNT(car_visits.id) AS count_1
FROM
          car_visits
WHERE
          %(param_1)s                 = car_visits.park_id
          AND car_visits.plate_number = %(plate_number_1)s
          AND car_visits.visited_at  >= %(visited_at_1)s

子查询消失了,速度也快了好多。


ralap
6 声望1 粉丝

不撸代码的车手不是一个好公路车手。